# Locating Data in DP2 Using XPath

XPath, short for XML Path Language, is a language for locating information within XML and HTML documents. It navigates a document through its elements and attributes, which makes it well suited to precise data retrieval in extraction tasks.

### Basics of XPath

XPath expressions select nodes: elements, attributes, text, and more. Below are basic XPath expressions and their functions:

- `//element`: Selects all nodes named `element` anywhere in the document.
- `/element`: Selects the `element` node directly under the document root, i.e. the top-level element when it has that name.
- `element[@attribute]`: Selects `element` nodes that have an attribute named `attribute`.
- `element[@attribute='value']`: Selects `element` nodes whose `attribute` equals `value`.
- `element/text()`: Retrieves the text nodes that are direct children of `element` nodes.
- `element/child::node()`: Selects all child nodes of `element` (elements, text, comments).

### Advanced Usage

XPath also supports logical operators (`and`, `or`), axes (`ancestor`, `descendant`, `following-sibling`), and functions (`contains()`, `starts-with()`, `not()`) for building more complex queries.

1. **Logical Operators:**

   ```xpath
   //input[@type='submit' or @type='button']
   ```

   This selects all `input` elements whose `type` attribute is either 'submit' or 'button'.

2. **Using Axes:**

   ```xpath
   //div/ancestor::form
   ```

   This expression finds `form` elements that are ancestors of `div` elements.

3. **Applying Functions:**

   ```xpath
   //h2[contains(text(),'News')]
   ```

   This selects `h2` elements whose text contains 'News'. In XPath 1.0 only the first text node of each `h2` is tested; `contains(., 'News')` tests the element's full string value instead.

### Applying XPath in DP2

In DP2, XPath expressions are specified as data selectors. For instance:

```json
{
  "elements": {
    "postTitle": {
      "col": "//div[contains(@class, 'post-title')]/text()",
      "type": "string"
    },
    "link": {
      "col": "//a/@href",
      "type": "string"
    }
  }
}
```

Here, `postTitle` extracts the text of `div` elements whose `class` attribute contains 'post-title', and `link` extracts the `href` attribute of every `a` element.
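To see how the basic selectors above behave on a concrete document, the following is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. It is not how DP2 evaluates selectors; the sample markup and variable names are hypothetical. ElementTree supports only a subset of XPath (no `contains()`, no `text()`, no attribute results), so those parts are emulated in plain Python. A full XPath engine such as `lxml` could evaluate the original expressions directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page, kept well-formed so the stdlib XML parser accepts it.
SAMPLE = """
<html>
  <body>
    <div class="post-title featured">First post</div>
    <div class="sidebar">Ignored</div>
    <form>
      <input type="submit" value="Send"/>
      <input type="text" name="q"/>
    </form>
    <a href="/news">News</a>
    <a name="anchor-only">No link</a>
  </body>
</html>
"""

root = ET.fromstring(SAMPLE)

# //a : every `a` element in the document
all_links = root.findall(".//a")

# //a[@href] : only `a` elements that carry an href attribute
links_with_href = root.findall(".//a[@href]")

# //input[@type='submit'] : attribute equals a given value
submit_inputs = root.findall(".//input[@type='submit']")

# //a/@href : ElementTree cannot return attributes, so read them from the elements
hrefs = [a.get("href") for a in links_with_href]

# //div[contains(@class, 'post-title')]/text() : contains() emulated with a substring test
post_titles = [
    div.text
    for div in root.iter("div")
    if "post-title" in div.get("class", "")
]

print(len(all_links), hrefs, post_titles)
```

The substring test mirrors `contains(@class, ...)`, including its main caveat: it would also match a class such as `post-title-old`.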
Here are some additional practical examples of using XPath to extract specific types of information:

### [Extracting Drug Information in `detail_step`:](Jexter%20Configuration:Extracting%20Drug%20Information%20in%20'detail_step'.md)

- **Select `
` elements within `